University of Maryland - AttributeLayoutViz

VAST 2008 Challenge
Mini Challenge 3:  Cell Phone Calls 

Authors and Affiliations:

       Aleks Aris, University of Maryland, College Park  aris@cs.umd.edu  [PRIMARY contact]
      Romain Vuillemot, LIRIS & INSA-Lyon, France
      Ben Shneiderman, University of Maryland, College Park [Faculty advisor]
      (NOTE: the PRIMARY contact needs to be joinable by email for a week after submission and again before the Sept. 1st deadline for camera ready materials)


Student team: YES
       

Tool(s):

We used NVSS (Network Visualization by Semantic Substrate), which has been developed by Aleks Aris and Ben Shneiderman at HCIL (the Human-Computer Interaction Lab) at the University of Maryland, College Park, from December 2005 to the present.

NVSS enables user control of node layout in networks using node attributes.

For more information, please visit the project web page: http://www.cs.umd.edu/hcil/nvss

 

 

Two Page Summary:   YES

 

        2-page summary

 

 

ANSWERS:


Phone-1: What is the Catalano/Vidro social network, as reflected in the cell phone call data, at the end of the time period?

   PhoneNodes.txt

   PhoneLinks.txt

   Create your 2 files (using this format), save them as phonenodes.txt and phonelinks.txt in the same directory as this form, and link the above words to those files

 


Phone-2: Characterize the changes in the Catalano/Vidro social structure over the ten day period.

Detailed Answer:

The dataset can be structured as nodes and links in a few different ways.

We considered the following two network structures for visualization in NVSS:

A)    Nodes represent calls and towers. Links are between calls and towers and represent calls made using the connected tower.

B)    Nodes represent cell phones. Links are calls between cell phones.

Each approach has trade-offs. (A) shows the data at its finest granularity (each call is visible as a node on the display), so it lets us see which calls are made over the ten-day period, day by day. However, it results in about 10,000 nodes, which slows NVSS down substantially and makes it harder to tell one node from another on the display. In addition, connections between cell phones are not available. A further disadvantage is that NVSS does not support multiple node types (i.e., nodes with different sets of attributes), which calls and towers would require. NVSS does not, however, limit the number of attributes a node can have; we could therefore simulate multiple node types by combining the attributes of all node types and using this superset as the attributes of every node. This circumvents the limitation and makes it possible to load the data into NVSS.

(B), on the other hand, does not have the problem of multiple node types, since the only node type is the cell phone. It also results in only 400 nodes, which allows fast interactive exploration in NVSS. However, the granularity of individual calls is lost. NVSS permits only simple graphs, so the multiple links between two nodes have to be aggregated into one link. NVSS also does not currently support link attributes, so this view cannot directly show any of the call attributes, such as duration, day of call, time of day, and tower. We could partly compensate for the missing tower attribute by introducing towers as a second node type and connecting the phones to the towers their calls used; even then, however, the time attribute would not be visible.

Considering these trade-offs, we chose to start with (A) and visualized the entire network in terms of calls and towers.
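The two structures can be sketched in a few lines of Python. This is only an illustration of the idea, not the NVSS input format: the record layout (caller, callee, day, duration, tower), the sample values, and the function names are all assumptions made for the example. Treating a call from X to Y and a call from Y to X as the same link in (B) is also an assumption.

```python
from collections import Counter

# Hypothetical call records: (caller_id, callee_id, day, duration_s, tower_id).
# The layout and the values are illustrative, not taken from the challenge data.
calls = [
    (1, 2, 1, 300, 30),
    (1, 2, 3, 1250, 1),
    (2, 1, 3, 90, 1),
    (3, 1, 9, 2150, 24),
]

# Superset of the attributes of both node types. Attributes that do not
# apply to a given node type are stored as None.
ATTRIBUTES = ("node_type", "day", "duration_s", "tower_id")


def build_call_tower_network(calls):
    """Structure (A): one node per call, one node per tower, call-tower links."""
    nodes, links = {}, []
    for index, (_, _, day, duration_s, tower_id) in enumerate(calls):
        call_key = f"call-{index}"
        tower_key = f"tower-{tower_id}"
        nodes[call_key] = dict(zip(ATTRIBUTES, ("call", day, duration_s, None)))
        nodes.setdefault(
            tower_key, dict(zip(ATTRIBUTES, ("tower", None, None, tower_id)))
        )
        links.append((call_key, tower_key))
    return nodes, links


def build_phone_network(calls):
    """Structure (B): one node per phone; parallel calls collapse into one link."""
    call_counts = Counter()
    for caller, callee, *_ in calls:
        # Assumption: direction is ignored, so (1, 2) and (2, 1) are one link.
        call_counts[tuple(sorted((caller, callee)))] += 1
    return call_counts


nodes, links = build_call_tower_network(calls)
phone_links = build_phone_network(calls)
```

In this sketch the number of aggregated calls is kept as a count per phone pair, but duration, day, and tower are lost in (B), which mirrors the trade-off described above.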

Figure 1 shows a first look at the dataset. The top rectangular region has a map background, and its nodes are the towers located on the map. The bottom region contains the calls, organized by day on the x-axis and duration (s) on the y-axis. Each node represents a group of calls and is sized by the number of calls in the group. The durations appear roughly normally distributed: very short and very long calls are few, while mid-length calls are many. Selecting the longest calls (2101-2200 seconds) with the duration filter on the Calls region shows that these calls used towers 1 and 11 on the 3rd day and tower 24 on the 9th day.
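The grouping and filtering described above can be approximated outside NVSS as follows. This is a minimal sketch under stated assumptions: the 100-second bin width is inferred from the 2101-2200 second range, the record layout (day, duration, tower) is hypothetical, and the sample durations are invented (only the day and tower values echo the text).

```python
from collections import Counter

BIN_WIDTH_S = 100  # assumed bin width, inferred from the 2101-2200 s range


def duration_bin(duration_s):
    """Return the (low, high) bounds of the bin holding duration_s."""
    index = max(duration_s - 1, 0) // BIN_WIDTH_S
    return index * BIN_WIDTH_S + 1, (index + 1) * BIN_WIDTH_S


def group_calls(calls):
    """Count calls per (day, duration bin); the count would set the node size."""
    return Counter((day, duration_bin(duration_s)) for day, duration_s, _ in calls)


def towers_by_day(calls, low, high):
    """Map each day to the sorted towers used by calls with low <= duration <= high."""
    result = {}
    for day, duration_s, tower_id in calls:
        if low <= duration_s <= high:
            result.setdefault(day, set()).add(tower_id)
    return {day: sorted(towers) for day, towers in result.items()}


# Hypothetical records: (day, duration_s, tower_id).
sample_calls = [(3, 2150, 1), (3, 2180, 11), (9, 2120, 24), (3, 950, 5), (3, 980, 5)]

groups = group_calls(sample_calls)
longest = towers_by_day(sample_calls, 2101, 2200)
```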

 fig01_longest calls

Figure 1 Longest calls occurred on the 3rd and 9th days

 

Looking at calls longer than 1900s, we can see how the calls are distributed across the towers (Figure 2). The figure shows all such calls; there are 24 of them, and they are distributed over days 2-10.

 

fig02_calls above 1900s

Figure 2 Calls longer than 1900s over 10 days.

We see that the most used towers for these calls (longer than 1900s) are towers 30 and 1; towers 11 and 29; those to the left of the middle of the island (towers 9, 12, 13, and 16); and those in the upper region of the lower part of the island (towers 20, 21, 22, and 24).

 

Next, we analyzed the quantity of calls over the ten-day period in terms of general area on the map. We divided the map into 4 areas from top to bottom: top-left, top-middle, middle-bottom, and bottom-right. They contain the following towers:

 

First, we focused on the Top-Left area and the calls made from this area on the 1st day (Figure 3).

 

fig02_Top-Left calls day1

Figure 3 Focusing on the top-left region, analyzing the calls by tower on the 1st day

 

Of the 1765 calls made from this area, 191 were made on the 1st day. (The total number of calls made from all areas is 9830.) The most active part of the area appears to be the leftmost, where towers 1 and 30 are located, followed by the part where tower 6 is located. Also, none of the longest calls were made from this area on the 1st day.
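To put these counts in proportion, the shares can be computed directly. The three counts come from the text above; the variable names are only illustrative.

```python
calls_top_left_day1 = 191    # calls from the top-left area on the 1st day
calls_top_left_total = 1765  # calls from the top-left area over all 10 days
calls_all_areas = 9830       # calls from all areas over all 10 days

day1_share = calls_top_left_day1 / calls_top_left_total
area_share = calls_top_left_total / calls_all_areas

print(f"Day 1 share of top-left calls: {day1_share:.1%}")  # 10.8%
print(f"Top-left share of all calls:   {area_share:.1%}")  # 18.0%
```

The day-1 share of about 10.8% is close to the 10% that an even spread over ten days would give.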

 

Next, we browsed days 2-10 to see the number of calls on each day (Figure 4).

On the 2nd day, activity increases on the left of the area (towers 1 & 30) and also at the top of the area (towers 4 & 6).

On the 3rd day, activity drops somewhat in all areas, and it increases slightly on the 4th day.

On the 5th day, activity increases on the left of the area, namely at towers 1 & 30, and on the 6th day tower 1 experiences a peak. Over the following 2 days it drops somewhat, and on the 10th day it increases slightly.

fig03_Top-Left calls day2-10

Figure 4 Focusing on the top-left region, analyzing the calls by tower on days 2-10 (top: 2,3,4, middle: 5,6,7, bottom: 8,9,10)
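The day-by-day browsing above was done visually in NVSS. A rough equivalent in Python would be a per-tower table of daily call counts. This is a sketch under assumptions: the record layout (day, tower) and the sample records are hypothetical, and the tower set lists only the top-left towers named in the text, since the full membership of the area is not given here.

```python
from collections import defaultdict

NUM_DAYS = 10

# Towers the text names in the top-left area; the full list is not given.
TOP_LEFT_TOWERS = {1, 4, 6, 30}


def daily_counts_by_tower(calls, towers):
    """Return {tower_id: [calls on day 1, ..., calls on day 10]} for the given towers."""
    counts = defaultdict(lambda: [0] * NUM_DAYS)
    for day, tower_id in calls:
        if tower_id in towers:
            counts[tower_id][day - 1] += 1
    return dict(counts)


def peak_day(per_day):
    """Return the 1-based day with the most calls (the earliest one on ties)."""
    return max(range(len(per_day)), key=per_day.__getitem__) + 1


# Hypothetical records: (day, tower_id).
sample_calls = [(1, 1), (6, 1), (6, 1), (6, 1), (2, 30), (2, 5)]

counts = daily_counts_by_tower(sample_calls, TOP_LEFT_TOWERS)
```

With real data, `peak_day(counts[1])` would give the day on which tower 1 carried the most calls.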

 

Next, we looked at the call volume in the middle-bottom area over 10 days (Figure 5).

Tower 22 appears to be used more during the first 5-6 days than during the following 4-5 days.

 

fig06_MiddleBottom calls day1-10

Figure 5 Call volume in the middle-bottom area over 10 days.

We also examined the call volume in the bottom-right region and found no interesting patterns: it was active on all 10 days with no noticeable differences.

 

Next, because of the large number of calls, we divided them into 5 categories by duration, with boundaries at 400s, 800s, 1200s, 1600s, and 2200s. Figure 6 shows the short calls (0-400s) on the left and the long calls (1600-2200s) on the right. While the top-middle region is active in both short and long calls, the top-left region is inactive in both except for towers 30 and 1: tower 30 is active in both, and tower 1 in long calls.
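The five duration categories can be expressed as a lookup over the upper boundaries. This is a minimal sketch: whether a boundary value such as 400s belongs to the lower or the upper category is not stated, so treating the boundaries as inclusive upper bounds is an assumption, as are the function names and the sample records.

```python
from bisect import bisect_left
from collections import Counter

# Upper boundaries of the five categories, in seconds (from the text).
UPPER_BOUNDS_S = [400, 800, 1200, 1600, 2200]


def duration_category(duration_s):
    """Return the category index 0-4 (0 = shortest calls, 4 = longest calls)."""
    index = bisect_left(UPPER_BOUNDS_S, duration_s)
    if index == len(UPPER_BOUNDS_S):
        raise ValueError(f"duration {duration_s}s exceeds the last boundary")
    return index


def tower_usage(calls, category):
    """Count calls per tower for one duration category."""
    return Counter(
        tower_id
        for duration_s, tower_id in calls
        if duration_category(duration_s) == category
    )


# Hypothetical records: (duration_s, tower_id).
sample_calls = [(100, 30), (2000, 30), (1900, 1), (350, 30)]

short_usage = tower_usage(sample_calls, 0)  # 0-400 s
long_usage = tower_usage(sample_calls, 4)   # above 1600 s, up to 2200 s
```

Comparing `short_usage` and `long_usage` per tower corresponds to the left and right halves of Figure 6.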

 

fig06_callsByDuration

Figure 6 Contrasting short and long calls by the location of the towers they use over the entire 10-day period.